FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem
Abstract
FlipFlop implements a fast method for de novo transcript discovery and abundance estimation from RNA-Seq data. It differs from Cufflinks by performing transcript identification and quantitation simultaneously using a penalized maximum likelihood approach, which leads to improved precision/recall. Other software taking this approach has exponential complexity in the number of exons in the gene. We use a novel algorithm based on a network flow formalism, which gives us a polynomial runtime. In practice, FlipFlop was shown to outperform other penalized maximum likelihood based software in terms of speed and to perform transcript discovery in less than half a second, even for large genes.
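To make the complexity claim concrete, here is a minimal sketch and not FlipFlop's implementation. It assumes a worst-case splicing graph in which every exon may be joined to any later exon, counts the candidate isoforms that an explicit enumeration would have to list, and then runs a plain Lasso by coordinate descent over that explicit list. The names `count_candidate_isoforms`, `soft_threshold` and `lasso_cd` are hypothetical, and a squared loss is swapped in for the penalized likelihood to keep the example short.

```python
from itertools import combinations


def count_candidate_isoforms(n_exons):
    """Count candidate isoforms when any exon may splice to any later exon.

    ending_at[i] is the number of candidate isoforms whose last exon is i:
    either the isoform starts at i, or it extends an isoform ending earlier.
    """
    ending_at = []
    for _ in range(n_exons):
        ending_at.append(1 + sum(ending_at))
    return sum(ending_at)  # equals 2**n_exons - 1


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of rows (one per exon), one column per candidate isoform.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of column j with the partial residual
            norm = 0.0  # squared norm of column j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / norm if norm > 0 else 0.0
    return beta


if __name__ == "__main__":
    # The explicit candidate list grows exponentially with the exon count.
    for n in (3, 10, 20):
        print(n, "exons ->", count_candidate_isoforms(n), "candidates")

    # Toy gene with 3 exons: list every candidate isoform explicitly.
    exons = [0, 1, 2]
    candidates = [c for r in (1, 2, 3) for c in combinations(exons, r)]
    # Design matrix: X[i][j] = 1 if candidate j contains exon i.
    X = [[1.0 if e in c else 0.0 for c in candidates] for e in exons]
    y = [10.0, 4.0, 10.0]  # made-up per-exon coverage, for illustration only
    beta = lasso_cd(X, y, lam=0.5)
    for c, b in zip(candidates, beta):
        if b != 0.0:
            print("isoform", c, "abundance", round(b, 3))
```

The explicit design matrix has one column per candidate isoform, so its width doubles with every added exon under this assumption. That enumeration is the step the abstract says the network flow formulation avoids.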
Similar resources
Efficient RNA isoform identification and quantification from RNA-Seq data with network flows
MOTIVATION: Several state-of-the-art methods for isoform identification and quantification are based on ℓ1-regularized regression, such as the Lasso. However, explicitly listing the possibly exponentially large set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the ℓ1-penalty are either restricted to...
BIRS/Banff 15w5142 - Statistical and Computational Challenges In Bridging Functional Genomics, Epigenomics, Molecular QTLs, and Disease Genetics
Monday 9:15am-9:50am, Laurent Jacob: Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Several state-of-the-art methods for isoform identification and quantification are based on ℓ1-regularized regression, such as the Lasso. However, explicitly listing the possibly exponentially large set of candidate transcripts is intractable for genes with many exons. ...
Differenced-Based Double Shrinking in Partial Linear Models
The partial linear model is very flexible when the relation between the covariates and responses is either parametric or nonparametric. However, estimation of the regression coefficients is challenging since one must also estimate the nonparametric component simultaneously. As a remedy, the differencing approach, to eliminate the nonparametric component and estimate the regression coefficients, can ...
Greedy algorithms for prediction
In many prediction problems, it is not uncommon that the number of variables used to construct a forecast is of the same order of magnitude as the sample size, if not larger. We then face the problem of constructing a prediction in the presence of potentially large estimation error. Control of the estimation error is either achieved by selecting variables or combining all the variables in some ...
Pivotal estimation via square-root Lasso in nonparametric regression
We propose a self-tuning √Lasso method that simultaneously resolves three important practical problems in high-dimensional regression analysis, namely it handles the unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme case...